The target cost formulation in unit selection speech synthesis

Author

  • Paul Taylor
Abstract

We review the various approaches that have been used to define the target cost in unit selection speech synthesis and show that there are a number of different, sometimes incompatible, ways of defining it. We propose that this cost should be thought of as a measure of how similar two units sound to a human listener. We discuss which features should be used in unit selection and the pros and cons of using derived features such as F0. We then explore some algorithms used to calculate target costs and show that none is ideal for the problem. Finally, we propose a new solution that uses a neural network to synthesise points in acoustic space around which we can build new clusters of units at run time.

Index terms: speech synthesis, unit selection, target cost, decision trees, neural networks

Related resources

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis gives computers the human ability to produce speech. The data-based and process-based approaches are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...

WFST Based Unit Selection for Concatenative Speech Synthesis in European Portuguese

The goal of the current work is to use Weighted Finite-State Transducers (WFSTs) to model the unit selection task, in a concatenative Text-to-Speech system. One of the major difficulties is the design of a perceptually meaningful cost function that weights and combines several features of the available inventory units, matching them to the target information. The WFST approach allows for great ...

Introducing visual target cost within an acoustic-visual unit-selection speech synthesizer

In this paper, we present a method to take into account visual information during the selection process in an acoustic-visual synthesizer. The acoustic-visual speech synthesizer is based on the selection and concatenation of synchronous bimodal diphone units i.e., speech signal and 3D facial movements of the speaker’s face. The visual speech information is acquired using a stereovision techniqu...

Symbolic vs. acoustics-based style control for expressive unit selection

The present paper addresses the issue of flexibility in expressive unit selection speech synthesis by using different style selection techniques. We select units from a mixed-style unit selection database, using either forced style switching, no control, symbolic target cost, or acoustic target cost as a style selection criterion. We assess the effect of selection technique, feature weight and ...

Unit Selection Algorithm Using Bi-grams Model For Corpus-Based Speech Synthesis

In this paper, we present a novel statistical approach to corpus-based speech synthesis. Classically, phonetic information is defined and considered as acoustic reference to be respected. In this way, many studies were elaborated for acoustical unit classification. This type of classification allows separating units according to their symbolic characteristics. Indeed, target cost and concatenat...

Publication date: 2006